Issues with 'HandleLLVMOptions' and 'LLVM_DEFINITIONS' #125779

Open
christopherbate opened this issue Feb 4, 2025 · 1 comment
Labels
cmake Build system in general and CMake in particular question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@christopherbate
Contributor

christopherbate commented Feb 4, 2025

TLDR:

  • Some projects use the sequence find_package(LLVM); include(HandleLLVMOptions) to set directory-scoped CMake compilation options based on CMake variables provided by LLVMConfig.cmake. This includes compiler options, linker options, and compiler definitions.
  • HandleLLVMOptions then also sets LLVM_DEFINITIONS to the set of definitions in the top-level CMake scope.
  • Downstream projects also invoke add_definitions(${LLVM_DEFINITIONS}), but they are often unaware that a) LLVM_DEFINITIONS is a space-separated list and needs to be pre-processed, b) it has different effects depending on its ordering relative to HandleLLVMOptions, and c) it can cause duplication of compiler definitions.
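
The pre-processing issue in a) can be sketched as follows. The definition values below are illustrative only, not what LLVMConfig.cmake actually provides on any given platform:

```cmake
# LLVM_DEFINITIONS may arrive as a single space-separated string,
# e.g. "-D_GNU_SOURCE -D__STDC_LIMIT_MACROS". Passing it directly to
# add_definitions() treats the whole string as ONE malformed flag.
# Illustrative value (the real content comes from LLVMConfig.cmake):
set(LLVM_DEFINITIONS "-D_GNU_SOURCE -D__STDC_LIMIT_MACROS")

# Wrong: the entire string is added as a single definition.
# add_definitions(${LLVM_DEFINITIONS})

# Right: split into a proper CMake list first.
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
```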
@EugeneZelenko EugeneZelenko added question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! cmake Build system in general and CMake in particular and removed new issue labels Feb 4, 2025
@christopherbate
Contributor Author

christopherbate commented Feb 5, 2025

Clarification on the issue after some investigation:

What is recommended by the docs at https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project is to invoke

  find_package(LLVM CONFIG REQUIRED)

  include_directories(${LLVM_INCLUDE_DIRS})
  separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
  add_definitions(${LLVM_DEFINITIONS_LIST})

The find_package call defines LLVM_DEFINITIONS by way of the installed or configured LLVMConfig.cmake file.

However, what MLIR projects do (as recommended in the MLIR Standalone example) is:

  find_package(MLIR CONFIG REQUIRED) # implicitly also does find_package(LLVM ...)
  
  include(HandleLLVMOptions)

  include_directories(${LLVM_INCLUDE_DIRS})
  separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
  add_definitions(${LLVM_DEFINITIONS_LIST})

The HandleLLVMOptions call is overriding LLVM_DEFINITIONS with a new value. Behavior is different depending on placement of include(HandleLLVMOptions).
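
A minimal sketch of the override, assuming LLVM_CMAKE_DIR has been appended to the module path so that HandleLLVMOptions can be found (the printed values are whatever the LLVM distribution provides):

```cmake
find_package(LLVM CONFIG REQUIRED)
# Value provided by LLVMConfig.cmake:
message(STATUS "LLVM_DEFINITIONS before: ${LLVM_DEFINITIONS}")

list(APPEND CMAKE_MODULE_PATH ${LLVM_CMAKE_DIR})
include(HandleLLVMOptions)
# HandleLLVMOptions has now overwritten LLVM_DEFINITIONS, so any earlier
# add_definitions() call saw a different value than this one:
message(STATUS "LLVM_DEFINITIONS after:  ${LLVM_DEFINITIONS}")
```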

So I think HandleLLVMOptions was perhaps not intended to be used out-of-tree, yet many projects are using it. Perhaps the guidance should be updated as below:

If building against installed/pre-packaged LLVM

  find_package(LLVM CONFIG REQUIRED)

  include_directories(${LLVM_INCLUDE_DIRS})
  separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
  add_definitions(${LLVM_DEFINITIONS_LIST})

  # Optional, if wanting to use LLVM-style CMake compilation flags in your
  # project (e.g. LLVM_USE_LINKER, LLVM_USE_SANITIZER, etc).
  # Must come AFTER any uses of `LLVM_DEFINITIONS`, since this will
  # override it. This will likely cause some duplication of compiler
  # definitions.
  include(HandleLLVMOptions)

If embedding LLVM as a sub-project

  add_subdirectory(${LLVM_PROJECT_SRC_DIR}/llvm ${LLVM_BINARY_DIR})

  find_package(LLVM CONFIG REQUIRED 
    HINTS ${LLVM_BINARY_DIR}/lib/cmake/llvm
  )
  
  # These are only needed for targets that depend on LLVM targets.
  include_directories(${LLVM_INCLUDE_DIRS})
  separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
  add_definitions(${LLVM_DEFINITIONS_LIST})
  
  # Optional, if wanting to use LLVM-style CMake compilation flags in your
  # project (e.g. LLVM_USE_LINKER, LLVM_USE_SANITIZER, etc).
  # Must come AFTER any uses of `LLVM_DEFINITIONS`, since this will
  # override it. This will likely cause some duplication of compiler
  # definitions.
  include(HandleLLVMOptions)

@christopherbate christopherbate changed the title Can recommendation for out-of-tree projects to add_definitions(${LLVM_DEFINITIONS}) be retired? Issues with 'HandleLLVMOptions' and 'LLVM_DEFINITIONS' Feb 15, 2025
christopherbate added a commit to christopherbate/stablehlo that referenced this issue Feb 15, 2025
This change attempts to resolve issues with use of `HandleLLVMOptions`
and `LLVM_DEFINITIONS`, see
llvm/llvm-project#125779.

Note that this is a breaking change because it could cause build
breakage for downstream users. As noted in the comments added to the
CMakeLists.txt file, there may not be one perfect CMake incantation
for setting Stablehlo's options that works for all users.

Since it's easier to *add* compiler options at a specific scope than it is
to alter/remove options that Stablehlo itself is setting, this change
is hoisting responsibility to the user for setting any compiler
options previously provided by the `HandleLLVMOptions` call when
building in embedded mode.

This means that if a user was using
`FetchContent|add_subdirectory|CPMAddPackage` to build Stablehlo
in their project, they should invoke

```
find_package(LLVM CONFIG REQUIRED)
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
include(HandleLLVMOptions)
```

in their project at the appropriate scope, or set desired flags in some
other manner.
christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Mar 4, 2025
Changes are listed oldest-to-newest:

0751887ed36b20cad7d47568ecf35745368acfa9 by Christopher Bate <[email protected]>:

[cmake] Fix macro redefinition issues

When building with certain LLVM distributions, the
`HandleLLVMOptions` CMake script can set `LLVM_DEFINITIONS` to a
space-separated list of `-Ditem=value` definitions. When passed to
`add_definitions`, this does not have the desired effect.

It's actually not clear where `add_definitions(${LLVM_DEFINITIONS})`
really needs to be invoked. I filed a GitHub issue upstream to figure it
out: llvm/llvm-project#125779

In our case, it only has the effect of duplicating options and therefore
shouldn't be called.

--
9bebb5a25c5cd1b80bcdec931e990ca81d0c4e41 by Chris Bate <[email protected]>:

[cmake] Retire some build system logic that is no longer necessary

We used to try to do the "right" thing and make targets carry compilation
definitions, but we now use CMake directory-level definitions instead.

Also, remove the `MLIR_TRT_USE_LINKER` flag since we just use `LLVM_USE_LINKER`.

--
88d92a49c52dac9f4d044f09b24e0c750e5cd064 by Zixin Huang <[email protected]>:

[python/CAPI] Add Python API to get device name like cuda:0

1. Added mtrtDeviceGetIndex to Runtime.cpp and Runtime.h.
2. Added Device.get_name() to RuntimePyBind.cpp.

The reason to return only the index, rather than a string, from Runtime.cpp is that passing string
pointers in/out can be complicated (it may require allocating the string buffer and estimating its size beforehand).

Test is added:
```
    devices = client.get_devices()
    print(devices[0].get_name())
```

--
7f106e7d501a7e0e93a54c4ec43afa0b4ad2d054 by Yuan Yao <[email protected]>:

[executor][python]: Fix runtime API for creating float/integer scalars.

Previously, when using the `RuntimeClient.create_scalar` API, float
numbers were erroneously cast to an integer. Also an `i32` scalar was
always created as an `i64` scalar. This commit fixes these bugs.

--
44ccb0370f3435b3f3366dfc534e7d17a0f2dc4e by Samurdhi Karunaratne <[email protected]>:

[tensorrt/lib/Utils] Add splat dense attribute support for TRT plugins

Although dense attributes were supported, splat dense attributes were
not supported so far. When a splat attr is detected some space is
allocated inside `PluginParams` so that the PluginField sees a buffer
correctly filled with the splat value.

Also fixes a bug in tensorrt/test/lib/Target/PluginUtils.h when printing
integer plugin fields.

Signed-off-by: Samurdhi Karunaratne <[email protected]>

--
e6c291638dbe993360f0f10b4e3649862ac9333e by Chris Bate <[email protected]>:

[cmake] Extensive updates to dependency management

Makes a big update to how dependencies are declared in the CMake
code:

Previously, we used a set of ad-hoc functions to wrap CPMAddPackage.
Each dependency had its own function `add_[depname]` defined in
`Dependencies.cmake`. This was unwieldy and disorganized. Furthermore,
other CMake dependencies like Stablehlo also need to declare their own
dependencies in some manner, and they do so via any number of means,
and the top-level project typically has no control over that besides
manually patching the upstream project's code.

To clean this all up into one coherent interface, we can declare a CMake
[Dependency Provider](https://cmake.org/cmake/help/latest/command/cmake_language.html#dependency-providers),
available in CMake 3.24+.

The dependency provider is a top-level CMake script which can only
be declared by the top-level project. It allows us to intercept any and
every `find_package` or `FetchContent_MakeAvailable` call in the whole
build, including those made by dependencies.

Using this mechanism, in our core CMakeLists.txt and in all the dependencies'
CMakeLists, we just need to use `find_package(<name> ...)` to declare
dependencies. The dependency provider then intercepts those `find_package`
calls and satisfies the dependency however we like.

To clean things up further, the default provider
(build_tools/cmake/DependencyProvider.cmake) now *declares* dependencies
at the top of the file (rather than forcing them to be downloaded immediately),
and they are only actually downloaded and incorporated into the build
when the `find_package` call is made.

In our case, `find_package` will cause `CPMAddPackage` to be invoked.
`nv_register_package` is the declaration function; it accepts
`CPMAddPackage` arguments as well as a special "POST_ADD_HOOK" field
specifying an inline script to execute. The contents of
`nv_register_package` are simply saved and forwarded to `CPMAddPackage`
(less POST_ADD_HOOK) when `find_package` is invoked. Finally, after
`CPMAddPackage` returns, the `POST_ADD_HOOK` is executed.
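
A minimal sketch of such a provider, using CMake's `cmake_language(SET_DEPENDENCY_PROVIDER ...)` mechanism. The names and structure here are hypothetical illustrations, not the project's actual implementation:

```cmake
# Illustrative dependency provider (helper names are hypothetical).
# Registered packages record their CPMAddPackage arguments in
# NV_PKG_ARGS_<name>; they are only fetched when find_package() fires.
macro(nv_provide_dependency method package_name)
  if(DEFINED "NV_PKG_ARGS_${package_name}")
    # Fetch/configure the package now with the saved arguments.
    CPMAddPackage(NAME ${package_name} ${NV_PKG_ARGS_${package_name}})
    # Tell CMake this find_package() call has been satisfied.
    set(${package_name}_FOUND TRUE)
  endif()
  # If the package is not registered, CMake automatically falls back
  # to the built-in find_package() behavior.
endmacro()

cmake_language(
  SET_DEPENDENCY_PROVIDER nv_provide_dependency
  SUPPORTED_METHODS FIND_PACKAGE)
```

Note that `SET_DEPENDENCY_PROVIDER` may only be invoked from a file listed in `CMAKE_PROJECT_TOP_LEVEL_INCLUDES`, which is what restricts providers to the top-level project.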

--
c2fcfa3f2db54c4ab0737f4df504776b4b07f19c by Christopher Bate <[email protected]>:

[cmake] Fix library search directory for TensorRT dialect LIT tests

Fixes the search directory for dynamic libraries, resolving
build failure when BUILD_SHARED_LIBS=ON.

--
9fe64b7d5075d615c520232cf79f60982d382ec1 by Christopher Bate <[email protected]>:

[tensorrt] NFC: Factor out 'isTensorRTInt8Type' to fix circular build issue

Recently, the addition of the new plugin support for TRT 10.8 caused a
circular dependency between the TensorRT Dialect library and the plugin
support library. This only became apparent when building with BUILD_SHARED_LIBS=ON.
This change factors out a "base" library for the TensorRT dialect
that resolves the circular dependency issue.

--
638abb11a731d48599ac2af349673e8324237931 by Sagar Shelke <[email protected]>:

[compiler/lib/Conversion] Add support for `stablehlo.custom_call` conversion

This MR adds a converter for the `stablehlo.custom_call` op to TensorRT.
Currently the only valid conversion is when the call target is `mhlo.topk`;
more conversions can be added later. MLIR tests are added.

--
669d488b6b13521df2b54246f4fcd301ab6a8ace by Christopher Bate <[email protected]>:

NFC: Fix incorrect header in NvInferPluginUtils

Point the include to TensorRTBase.h; otherwise we may be including
generated headers on which NvInferPluginUtils does not have a dependency
(and might not be generated yet in the build).

--
340a8a5e0888a8c881d469758f4c871f7233e976 by Christopher Bate <[email protected]>:

[executor] Update 'allocs-to-globals' pass to use DataLayout for type size calculation

Fixes an issue where 'allocs-to-globals' can crash if the memref
element type does not have a canonical bitwidth, e.g. `complex<...>`.
Use the module's data layout to get the type byte size instead.

--
6ca9b5da0d7bd672aaef7d2bc78a917d1ed159a2 by Chris Bate <[email protected]>:

[compiler] Cleanup pipeline declarations in Plan Dialect Transforms

This change removes unused declarations from the Plan Dialect 'Passes.h'
header file and removes unused pipeline registrations. We consolidate
the three bufferization stages (bufferization, optimizations, deallocation)
into a single pipeline declaration that is used by the different top-level
pipelines (e.g. linalg-to-executable, stablehlo-to-executable, and so on).

--
464616b4291de0dc80a6b6ce40e7ecd8a06c99fd by Christopher Bate <[email protected]>:

[compiler] NFC: add some additional 'plan-bufferize-pipeline' regression tests

Adds some additional regression tests for the end-to-end bufferization
and deallocation pipeline. This set of tests capture cases where we
are producing sub-optimal code related to host constants and shape
values.

--
283bf821e4bc6f8a6b5a4f17509d87fcf72a0023 by Christopher Bate <[email protected]>:

NFC: fix missing FileCheck command in test added in 6ca9b5da0d7bd672aaef7d2bc78a917d1ed159a2

--
bf050a70cdc1c714b5f1d782a043f889abf3272a by Christopher Bate <[email protected]>:

[compiler] Use bufferization analysis in 'plan-alloc-tensors'

This change reworks the parts of the 'plan-alloc-tensors' pass that tried
to establish DPS connectivity in loop regions and at function boundaries.
We now use existing routines in the bufferization analysis to support
the transformation, which yields better results and simpler code.
Additionally, this closes a bug where the transform could potentially
try modifying functions without fixing the callers if inlining was not
performed. We now explicitly check to make sure the functions we are
modifying do not have any callers and that we do not traverse nested
symbol tables. A future change can update the pass to allow modifying
functions with callers.

--
9f03c958cdee998d4c6ea059aa9125e395b68688 by Christopher Bate <[email protected]>:

[compiler] Add new module bufferization pass

This change forks the upstream 'one-shot-module-bufferization' logic
into a 'module-bufferization' pass under the 'mlir-tensorrt/compiler' project.

The upstream 'OneShotModuleBufferization' transformation is an extension
of the core one-shot-bufferization infrastructure that includes some
additional analysis state which is attached to functions. Functions are
analyzed to identify which arguments are read/written and additionally
which arguments are equivalent to which result values.

We require forking the upstream pass because the upstream transformation
does not currently allow for supporting functions and call-like operations
outside of `func.func` and `func.call` (support for other operations using
the appropriate interfaces has had trouble landing upstream). In addition,
the upstream pass does not take into account the potential nesting of
modules/symbol-tables where nested modules may require different
bufferization options than the parent module and functions may be called
from a parent module (if using call-like operations other than `func.call`).

To allow dynamic querying of bufferization options from modules, we create
a new interface, `BufferizationScopeOpInterface` that lets us attach
methods for producing OneShotBufferizationOptions to our custom module-like
operations.

This new interface, together with the forked pass logic, allows us to
jointly bufferize multiple nested modules, whereas previously the
OneShotModuleBufferization limitations forced a phase-ordering constraint
-- nested modules had to be bufferized prior to outer modules.

--
6fbed19b52415fc21ccc13807b1112886f405ef8 by Christopher Bate <[email protected]>:

[executor] Fix Executor flatbuffer C++ generation cmake function

This change simply ensures that if the flatbuffer dependency version
changes that the targets which generate C++ from the flatbuffer schema
file will also go out-of-date, forcing re-generation of the C++ flatbuffer
files.
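
One way to achieve this kind of invalidation (a sketch only; the file, target, and variable names are hypothetical, not the project's actual code) is to write the dependency version into a stamp file that participates in the custom command's dependency graph:

```cmake
# Hypothetical names; illustrates regenerating flatbuffer C++ output
# when the flatbuffers dependency version changes. file(WRITE) only
# touches the stamp when its content actually changes.
set(_flatc_version_stamp "${CMAKE_CURRENT_BINARY_DIR}/flatc.version")
file(WRITE "${_flatc_version_stamp}" "${FLATBUFFERS_VERSION}")

add_custom_command(
  OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/executor_generated.h"
  COMMAND flatc --cpp -o "${CMAKE_CURRENT_BINARY_DIR}" executor.fbs
  DEPENDS executor.fbs "${_flatc_version_stamp}"
  WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}")
```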

--
cd8c672be49aaafc5d480d24f900a413f1effd59 by Christopher Bate <[email protected]>:

[executor] Bump Flatbuffers to v25.2.10

Bumps the Flatbuffers version and migrates it to the new dependency
handling mechanism.

--
99f60bfe28d1247c060cc0be232ec73546785bb9 by Christopher Bate <[email protected]>:

[executor] Set max Flatbuffer verification size to the 64bit limit

Since we use Flatbuffer's 64bit features, we need to explicitly enable
verification to allow 64bit offsets.

--
523da05c5720f395ce9c5815d2a44e82f51e924d by Christopher Bate <[email protected]>:

[compiler] Change 'allow-return-allocs-from-loops' to 'true' by default

This allows us to handle loops that don't bufferize in-place during
'plan-module-bufferization'. This was the original behavior of `plan-bufferize`.
Certain integration tests for generation models won't work without it
currently.

--
a31e690e2a863aaf52528b8de1c66a2f1b1ea081 by Christopher Bate <[email protected]>:

[tensorrt] Add a size limit to constant folding in 'tensorrt-transpose-elimination'

Constant folding of transpose operations in the `tensorrt-transpose-elimination`
pass can be extremely costly for large weights (e.g. an embedding matrix).
We need to impose an upper bound on the constant folding we will allow
via naive transposition of DenseElementsAttrs.

--
3b7017e2356022db4ed21369422b2ead639b930b by Christopher Bate <[email protected]>:

[tensorrt] NFC: add a statistic to the 'tensorrt-broadcast-elimination' pass

Add a simple statistic (only set when built with assertions enabled) that
counts the number of eliminated broadcast operations.

--
6feeec4d85ab976ed16676795aa426a87f4a7f38 by Christopher Bate <[email protected]>:

[tensorrt] Place an upper bound on constant folding computation

Places an upper bound of `1<<17` on the number of elements allowed for
constant folding of `DenseElementsAttrs` in the `fold` method of
TensorRT operations.

--
e8690a68101085e606746361db41533597ef3c1d by Sagar Shelke <[email protected]>:

[Conversion/StablehloToTensorRT] Ease restrictions of `stablehlo.reduce` conversion to TensorRT

Previously, `stablehlo.reduce` conversion to TensorRT had the
following restrictions.
- If reduction axes are not contiguous, conversion was unsupported.
- In case of multiple reduction axes, reduction happened through reshape
and reduction on single axis.

It turns out that this restriction is too strict. TensorRT supports
reduction on multiple as well as non-contiguous axes.

Stablehlo-to-TensorRT conversion tests are updated and a TensorRT test
is added. The Stablehlo-to-TensorRT engine conversion path is tested for
reduction-specific cases with both TensorRT 8.6 and 10.7.

--
af0c30a2d3943a8f8b5a80be49a7c0fc945ae07d by Christopher Bate <[email protected]>:

[compiler] Fix failing `test_tensorrt_add.py` after importing OSS changes

Fixes a test that was failing after importing GitHub MLIR-TensorRT
changes.

--
5599299c5ce64aa0cafad411d90d7fe3bbda7f26 by Christopher Bate <[email protected]>:

NFC: [executor] cleanup TensorRTModule output allocator logic

Updates TensorRT output allocator logic to remove "reallocation" features
which were not usable. As we currently use it, the output allocator lazily
allocates new results for each invocation. In order to improve allocator
performance in the future, we should:

a) Pre-allocate when we know the static dimensions of the output
b) Represent the allocator in the IR so that it can be hoisted out of loops,
   allowing the outputs of TRT engines whose results don't escape the function
   to be re-used without an additional allocation.

Co-authored-by: Sagar Shelke <[email protected]>
Co-authored-by: Zixin Huang <[email protected]>
Co-authored-by: Yuan Yao <[email protected]>
Co-authored-by: Samurdhi Karunaratne <[email protected]>

GitOrigin-RevId: 4c4fdae1e9ae089df09e61abf06be28adc7217ca
christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Mar 4, 2025
Changes are listed oldest-to-newest:

0751887ed36b20cad7d47568ecf35745368acfa9 by Christopher Bate <[email protected]>:

[cmake] Fix macro redefinition issues

When building with with certain LLVM distributions, the
`HandleLLVMOptions` CMake script can set `LLVM_DEFINITIONS` to a
space-separated list of `-Ditem=value` definitions. When passed to
`add_definitions`, this does not have the desired effect.

It's actually not clear where `add_definitions(${LLVM_DEFINITIONS})`
really needs to be invoked. I filed a GitHub issue upstream to figure it
out: llvm/llvm-project#125779

In our case, it only has the effect of duplicating options and therefore
shouldn't be called.

--
9bebb5a25c5cd1b80bcdec931e990ca81d0c4e41 by Chris Bate <[email protected]>:

[cmake] Retire some build system logic that is no longer necessary

We used to try to do the "right" thing and make targets carry compilation
definitions, but we instead use CMake directory level definitions now.

Also, remove the `MLIR_TRT_USE_LINKER` flag since we just use `LLVM_USE_LINKER`.

--
88d92a49c52dac9f4d044f09b24e0c750e5cd064 by Zixin Huang <[email protected]>:

[python/CAPI] Add Python API to get device name like cuda:0

1. added mtrtDeviceGetIndex into Runtime.cpp Runtime.h.
2. added Device.get_name() into RuntimePyBind.cpp

The reason to only get index but not string in Runtime.cpp because passing in/out string
ptrs can be complicated (which may require allocating the string buffer and estimating its size beforehand).

Test is added:
```
    devices = client.get_devices()
    print(devices[0].get_name())
```

--
7f106e7d501a7e0e93a54c4ec43afa0b4ad2d054 by Yuan Yao <[email protected]>:

[executor][python]: Fix runtime API for creating float/integer scalars.

Previously, when using the `RuntimeClient.create_scalar` API, float
numbers were erroneously cast to an integer. Also an `i32` scalar was
always created as an `i64` scalar. This commit fixes these bugs.

--
44ccb0370f3435b3f3366dfc534e7d17a0f2dc4e by Samurdhi Karunaratne <[email protected]>:

[tensorrt/lib/Utils] Add splat dense attribute support for TRT plugins

Although dense attributes were supported, splat dense attributes were
not supported so far. When a splat attr is detected some space is
allocated inside `PluginParams` so that the PluginField sees a buffer
correctly filled with the splat value.

Also fixes a bug in tensorrt/test/lib/Target/PluginUtils.h when printing
integer plugin fields.

Signed-off-by: Samurdhi Karunaratne <[email protected]>

--
e6c291638dbe993360f0f10b4e3649862ac9333e by Chris Bate <[email protected]>:

[cmake] Extensive updates to dependency management

Makes a big update to how dependencies are declared in the CMake
code:

Previously, we used a set of ad-hoc functions to wrap CPMAddPackage.
Each dependency had its own function defined `add_[depname]` in
`Depependencies.cmake`. This was unwieldy and disorganized. Furthermore,
other CMake dependencies like Stablehlo also need to manage
to declare their dependencies in some manner, and they do so via any
number of means, and the top-level project typically doesn't have any
control over that besides manually patching the upstream project's code.

To clean this all up into one coherent interface, we can declare a CMake
[Dependency Provider](https://cmake.org/cmake/help/latest/command/cmake_language.html#dependency-providers),
available in CMake 3.24+.

The dependency provider is a top-level CMake script which can only
be declared by the top-level project. It allows us intercept any and
every `find_package` or `FetchContent_MakeAvailable` call in the whole
build, including for dependencies.

Using this mechanism, in our core CMakeLists.txt, and in all the dependencies'
CMakeLists, we just need to use `find_package(<name> ...)` to declare
dependencies. The Dependency Provider then intercepts those `find_package`
calls and satisfies the depdency however we like.

To clean things up further, the default provider
(build_tools/cmake/DependencyProvider.cmake) now *declares* dependencies
at the top of the file (rather than forcing them to be downloaded immediately),
and they are only actually downloaded and incorporated into the build
when the `find_package` call is made.

In our case, `find_package` will cause `CPMAddPackage` to be invoked.
The `nv_register_package` is the declaration function, and it accepts
to CPMAddPackage arguments as well as a special "POST_ADD_HOOK" field
to indicate inline a script to be executed. The contents of
`nv_register_package` are simply saved and forwarded to `CPMAddPackage`
(less POST_ADD_HOOK) when `find_package` is invoked. Finally, after
`CPMAddPackage` returns, the `POST_ADD_HOOK` is executed.

--
c2fcfa3f2db54c4ab0737f4df504776b4b07f19c by Christopher Bate <[email protected]>:

[cmake] Fix library search directory for TensorRT dialect LIT tests

Fixes the search directory for dynamic libraries, resolving
build failure when BUILD_SHARED_LIBS=ON.

--
9fe64b7d5075d615c520232cf79f60982d382ec1 by Christopher Bate <[email protected]>:

[tensorrt] NFC: Factor out 'isTensorRTInt8Type' to fix circular build issue

Recently, the addition of the new plugin support for TRT 10.8 caused a
circular dependency between the TensorRT Dialect library and the plugin
support library. This only became apparent when building with BUILD_SHARED_LIBS=ON.
This change factors out a "base" library for the TensorRT dialect
that resolves the circular dependency issue.

--
638abb11a731d48599ac2af349673e8324237931 by Sagar Shelke <[email protected]>:

[compiler/lib/Conversion] Add support for `stablehlo.custom_call` conversion

This MR adds a converter for `stablehlo.custom_call` op conversion to TensorRT.
Currently only valid conversion is when call target is `mhlo.topk` and more
possible conversions can be added later.
MLIR tests are added.

--
669d488b6b13521df2b54246f4fcd301ab6a8ace by Christopher Bate <[email protected]>:

NFC: Fix incorrect header in NvInferPluginUtils

Point the include to TensorRTBase.h; otherwise we may be including
generated headers on which NvInferPluginUtils does not have a dependency
(and might not be generated yet in the build).

--
340a8a5e0888a8c881d469758f4c871f7233e976 by Christopher Bate <[email protected]>:

[executor] Update 'allocs-to-globals' pass to use DataLayout for type size calculation

Fixes an issue where 'allocs-to-globals' can crash if the the memref
element type does not have a canonical bitwidth, e.g. `complex<...>`.
Use the module's data layout to get the type byte size instead.

--
6ca9b5da0d7bd672aaef7d2bc78a917d1ed159a2 by Chris Bate <[email protected]>:

[compiler] Cleanup pipeline declarations in Plan Dialect Transforms

This change removes unused declarations from the Plan Dialect 'Passes.h'
header file and removes unused pipeline registrations. We consolidate
the three bufferization stages (bufferization, optimizations, deallocation)
into a single pipeline declaration that is used by the different top-level
pipelines (e.g. linalt-to-executable, stablehlo-to-executable, and so on).

--
464616b4291de0dc80a6b6ce40e7ecd8a06c99fd by Christopher Bate <[email protected]>:

[compiler] NFC: add some additional 'plan-bufferize-pipeline' regression tests

Adds some additional regression tests for the end-to-end bufferization
and deallocation pipeline. This set of tests capture cases where we
are producing sub-optimal code related to host constants and shape
values.

--
283bf821e4bc6f8a6b5a4f17509d87fcf72a0023 by Christopher Bate <[email protected]>:

NFC: fix missing FileCheck command in test added in 6ca9b5da0d7bd672aaef7d2bc78a917d1ed159a2

--
bf050a70cdc1c714b5f1d782a043f889abf3272a by Christopher Bate <[email protected]>:

[compiler] Use bufferization analysis in 'plan-alloc-tensors'

This change reworks the parts of the 'plan-alloc-tensors' pass that tried
to establish DPS connectivity in loop regions and at function boundaries.
We now use existing routines in the bufferization analysis to support
the transformation, which results in better results and more simplified
code. Additionally, a bug is closed where the transform could potentially
try modifying functions without fixing the callers if inlining was not
performed. We now explicitly check to make sure the functions we are
modifying do not have any callers and that we do not traverse nested
symbol tables. A future change can update the pass to allow modifying
functions with callers.

--
9f03c958cdee998d4c6ea059aa9125e395b68688 by Christopher Bate <[email protected]>:

[compiler] Add new module bufferization pass

This change forks the upstream 'one-shot-module-bufferization' logic
into a 'module-bufferization' pass under the 'mlir-tensorrt/compiler' project.

The upstream 'OneShotModuleBufferization' transformation is an extension
of the core one-shot-bufferization infrastructure that includes some
additional analysis state which is attached to functions. Functions are
analyzed to identify which arguments are read/written and additionally
which arguments are equivalent to which result values.

We require forking the upstream pass because the upstream transformation
does not currently allow for supporting functions and call-like operations
outside of `func.func` and `func.call` (support for other operations using
the appropriate interfaces has had trouble landing upstream). In addition,
the upstream pass does not take into account the potential nesting of
modules/symbol-tables where nested modules may require different
bufferization options than the parent module and functions may be called
from a parent module (if using call-like operations other than `func.call`).

To allow dynamic querying of bufferization options from modules, we create
a new interface, `BufferizationScopeOpInterface` that lets us attach
methods for producing OneShotBufferizationOptions to our custom module-like
operations.

This new interface, together with the forked pass logic, allows us to
jointly bufferize multiple nested modules, whereas previously the
OneShotModuleBufferization limitations forced a phase-ordering constraint
-- nested modules had to be bufferized prior to outer modules.

--
6fbed19b52415fc21ccc13807b1112886f405ef8 by Christopher Bate <[email protected]>:

[executor] Fix Executor flatbuffer C++ generation cmake function

This change simply ensures that if the flatbuffer dependency version
changes that the targets which generate C++ from the flatbuffer schema
file will also go out-of-date, forcing re-generation of the C++ flatbuffer
files.

--
cd8c672be49aaafc5d480d24f900a413f1effd59 by Christopher Bate <[email protected]>:

[executor] Bump Flatbuffers to v25.2.10

Bumps the Flatbuffers version and migrates it to the new dependency
handling mechanism.

--
99f60bfe28d1247c060cc0be232ec73546785bb9 by Christopher Bate <[email protected]>:

[executor] Set max Flatbuffer verification size to the 64bit limit

Since we use Flatbuffer's 64bit features, we need to explicitly enable
verification to allow 64bit offsets.

--
523da05c5720f395ce9c5815d2a44e82f51e924d by Christopher Bate <[email protected]>:

[compiler] Change 'allow-return-allocs-from-loops' to 'true' by default

This allows us to handle loops that don't bufferize in-place during
'plan-module-bufferization'. This was the original behavior of `plan-bufferize`.
Certain integration tests for generation models won't work without it
currently.

--
a31e690e2a863aaf52528b8de1c66a2f1b1ea081 by Christopher Bate <[email protected]>:

[tensorrt] Add a size limit to constant folding in 'tensorrt-transpose-elimination'

Constant folding of transpose operations in the `tensorrt-transpose-elimination`
pass can be extremely costly for large weights (e.g. an embedding matrix).
We need to impose an upper bound on the constant folding we allow
via naive transposition of DenseElementsAttrs.

--
3b7017e2356022db4ed21369422b2ead639b930b by Christopher Bate <[email protected]>:

[tensorrt] NFC: add a statistic to the 'tensorrt-broadcast-elimination' pass

Add a simple statistic (only set when built with assertions enabled) that
counts the number of eliminated broadcast operations.
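The statistic described above can be modeled as a counter that is only maintained in assert-enabled builds. This is a hedged Python sketch (class and attribute names are illustrative, not the MLIR `Pass::Statistic` API), using `__debug__` to mirror the assertions-enabled gating:

```python
class BroadcastEliminationPass:
    def __init__(self):
        self.num_broadcasts_eliminated = 0  # the pass statistic

    def run(self, ops):
        """Toy model: drop 'broadcast' ops, counting eliminations only
        when assertions are enabled (python -O disables __debug__)."""
        remaining = []
        for op in ops:
            if op == "broadcast":
                if __debug__:  # statistic updated only in debug builds
                    self.num_broadcasts_eliminated += 1
            else:
                remaining.append(op)
        return remaining

p = BroadcastEliminationPass()
out = p.run(["broadcast", "add", "broadcast", "mul"])
assert out == ["add", "mul"]
assert p.num_broadcasts_eliminated == 2
```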

--
6feeec4d85ab976ed16676795aa426a87f4a7f38 by Christopher Bate <[email protected]>:

[tensorrt] Place an upper bound on constant folding computation

Places an upper bound of `1<<17` on the number of elements allowed for
constant folding of `DenseElementsAttrs` in the `fold` method of
TensorRT operations.

--
e8690a68101085e606746361db41533597ef3c1d by Sagar Shelke <[email protected]>:

[Conversion/StablehloToTensorRT] Ease restrictions of `stablehlo.reduce` conversion to TensorRT

Previously, `stablehlo.reduce` conversion to TensorRT had the
following restrictions:
- If reduction axes were not contiguous, conversion was unsupported.
- In the case of multiple reduction axes, reduction happened through a reshape
and a reduction on a single axis.

It turns out these restrictions are too strict: TensorRT supports
reduction on multiple as well as non-contiguous axes.

Stablehlo to TensorRT conversion tests are updated and a tensorrt
test is added. Stablehlo to TensorRT engine conversion path is tested for
reduction specific cases with both TensorRT 8.6 and 10.7.
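A toy demonstration of why the restriction was unnecessary: reducing over non-contiguous axes in one step yields the same result as the old axis-at-a-time lowering. Pure Python, with an illustrative 2x3x4 tensor:

```python
# Reduce a 2x3x4 tensor over the non-contiguous axes (0, 2).
I, J, K = 2, 3, 4
x = [[[i * 12 + j * 4 + k for k in range(K)] for j in range(J)]
     for i in range(I)]

# One-shot reduction over both axes at once.
direct = [sum(x[i][j][k] for i in range(I) for k in range(K))
          for j in range(J)]

# Old restricted lowering: reduce one axis at a time.
step1 = [[sum(x[i][j]) for j in range(J)] for i in range(I)]    # drop axis 2
step2 = [sum(step1[i][j] for i in range(I)) for j in range(J)]  # drop axis 0

assert direct == step2 == [60, 92, 124]
```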

--
af0c30a2d3943a8f8b5a80be49a7c0fc945ae07d by Christopher Bate <[email protected]>:

[compiler] Fix failing `test_tensorrt_add.py` after importing OSS changes

Fixes a test that was failing after importing GitHub MLIR-TensorRT
changes.

--
5599299c5ce64aa0cafad411d90d7fe3bbda7f26 by Christopher Bate <[email protected]>:

NFC: [executor] cleanup TensorRTModule output allocator logic

Updates TensorRT output allocator logic to remove "reallocation" features
which were not usable. As we currently use it, the output allocator lazily
allocates new results for each invocation. In order to improve allocator
performance in the future, we should:

a) Pre-allocate when we know the static dimensions of the output
b) Represent the allocator in the IR so that it can be hoisted out of loops,
   allowing the outputs of TRT engines whose results don't escape the function
   to be re-used without an additional allocation.
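Point (b) above can be sketched as a size-keyed buffer cache. This is an illustrative Python model under assumed names (`OutputAllocator`, `allocate`), not the actual executor code:

```python
class OutputAllocator:
    """Caches output buffers by size so repeated invocations with static
    output shapes reuse a buffer instead of reallocating each time."""

    def __init__(self):
        self.allocations = 0
        self._cache = {}

    def allocate(self, nbytes):
        buf = self._cache.get(nbytes)
        if buf is None:
            self.allocations += 1          # a real allocation happens here
            buf = bytearray(nbytes)
            self._cache[nbytes] = buf
        return buf

alloc = OutputAllocator()
for _ in range(3):
    out = alloc.allocate(1024)             # same static output size
assert alloc.allocations == 1              # reused, not reallocated
```

Hoisting the allocator out of a loop in the IR, as the commit message suggests, would make this reuse explicit to the compiler rather than a runtime cache.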

Co-authored-by: Sagar Shelke <[email protected]>
Co-authored-by: Zixin Huang <[email protected]>
Co-authored-by: Yuan Yao <[email protected]>
Co-authored-by: Samurdhi Karunaratne <[email protected]>

GitOrigin-RevId: 4c4fdae1e9ae089df09e61abf06be28adc7217ca
christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Mar 4, 2025
christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Mar 5, 2025
christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Mar 5, 2025