Changes are listed oldest-to-newest:
0751887ed36b20cad7d47568ecf35745368acfa9 by Christopher Bate
<[email protected]>:
[cmake] Fix macro redefinition issues
When building with certain LLVM distributions, the
`HandleLLVMOptions` CMake script can set `LLVM_DEFINITIONS` to a
space-separated list of `-Ditem=value` definitions. When passed to
`add_definitions`, this does not have the desired effect.
It's actually not clear where `add_definitions(${LLVM_DEFINITIONS})`
really needs to be invoked. I filed a GitHub issue upstream to figure it
out: llvm/llvm-project#125779
In our case, it only has the effect of duplicating options and therefore
shouldn't be called.
--
9bebb5a25c5cd1b80bcdec931e990ca81d0c4e41 by Chris Bate
<[email protected]>:
[cmake] Retire some build system logic that is no longer necessary
We used to try to do the "right" thing and make targets carry
compilation definitions, but we instead use CMake directory level
definitions now.
Also, remove the `MLIR_TRT_USE_LINKER` flag since we just use
`LLVM_USE_LINKER`.
--
88d92a49c52dac9f4d044f09b24e0c750e5cd064 by Zixin Huang
<[email protected]>:
[python/CAPI] Add Python API to get device name like cuda:0
1. Added `mtrtDeviceGetIndex` to Runtime.cpp and Runtime.h.
2. Added `Device.get_name()` to RuntimePyBind.cpp.
Runtime.cpp returns only the index rather than the name string because
passing strings in and out through pointers can be complicated (it may
require allocating the string buffer and estimating its size beforehand).
Test is added:
```
devices = client.get_devices()
print(devices[0].get_name())
```
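As a rough illustration of the split described above (the C API returns only an index and the Python side builds the name), the following sketch assumes the name has the form `cuda:<index>`. `format_device_name` is a hypothetical helper, not a function from the bindings:

```python
def format_device_name(index: int, kind: str = "cuda") -> str:
    """Hypothetical helper: build a name such as 'cuda:0' from a device index."""
    if index < 0:
        raise ValueError("device index must be non-negative")
    return f"{kind}:{index}"


print(format_device_name(0))
```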
--
7f106e7d501a7e0e93a54c4ec43afa0b4ad2d054 by Yuan Yao
<[email protected]>:
[executor][python]: Fix runtime API for creating float/integer scalars.
Previously, when using the `RuntimeClient.create_scalar` API, float
numbers were erroneously cast to an integer. Also an `i32` scalar was
always created as an `i64` scalar. This commit fixes these bugs.
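The two failure modes can be illustrated without the runtime, using only Python's `struct` module. This is a sketch of the bug class, not the actual `create_scalar` implementation; `pack_scalar` and its type-code strings are hypothetical:

```python
import struct

# Hypothetical mapping from scalar type names to little-endian struct formats.
_FORMATS = {"i32": "<i", "i64": "<q", "f32": "<f", "f64": "<d"}


def pack_scalar(value, type_code: str) -> bytes:
    """Encode a Python number using exactly the requested scalar type."""
    fmt = _FORMATS[type_code]
    if type_code.startswith("i"):
        if not isinstance(value, int):
            raise TypeError("integer scalar requires an int value")
        return struct.pack(fmt, value)
    # Floats are packed as floats, so the fractional part is preserved.
    return struct.pack(fmt, float(value))


f32_bytes = pack_scalar(1.5, "f32")
i32_bytes = pack_scalar(7, "i32")
```

Checking `len(i32_bytes)` against `len(pack_scalar(7, "i64"))` shows the width difference that is lost when every integer is promoted to `i64`.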
--
44ccb0370f3435b3f3366dfc534e7d17a0f2dc4e by Samurdhi Karunaratne
<[email protected]>:
[tensorrt/lib/Utils] Add splat dense attribute support for TRT plugins
Although dense attributes were supported, splat dense attributes were
not supported so far. When a splat attr is detected some space is
allocated inside `PluginParams` so that the PluginField sees a buffer
correctly filled with the splat value.
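The idea of materializing a splat into a filled buffer can be sketched as follows. This is an illustration only, assuming a little-endian `f32` element type; `expand_splat` is a hypothetical name and does not mirror the `PluginParams` code:

```python
import struct


def expand_splat(value: float, num_elements: int, fmt: str = "<f") -> bytes:
    """Materialize a splat (one repeated value) as a dense byte buffer."""
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    # Pack the single value once, then repeat its bytes for every element.
    return struct.pack(fmt, value) * num_elements


buf = expand_splat(2.0, 4)
```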
Also fixes a bug in tensorrt/test/lib/Target/PluginUtils.h when printing
integer plugin fields.
Signed-off-by: Samurdhi Karunaratne <[email protected]>
--
e6c291638dbe993360f0f10b4e3649862ac9333e by Chris Bate
<[email protected]>:
[cmake] Extensive updates to dependency management
Makes a big update to how dependencies are declared in the CMake code:
Previously, we used a set of ad-hoc functions to wrap CPMAddPackage.
Each dependency had its own function defined `add_[depname]` in
`Depependencies.cmake`. This was unwieldy and disorganized. Furthermore,
other CMake dependencies like Stablehlo also need to declare their own
dependencies, and they do so by any number of means. The top-level
project typically has no control over that besides manually patching
the upstream project's code.
To clean this all up into one coherent interface, we can declare a CMake
[Dependency
Provider](https://cmake.org/cmake/help/latest/command/cmake_language.html#dependency-providers),
available in CMake 3.24+.
The dependency provider is a top-level CMake script which can only be
declared by the top-level project. It allows us to intercept any and every
`find_package` or `FetchContent_MakeAvailable` call in the whole build,
including for dependencies.
Using this mechanism, in our core CMakeLists.txt, and in all the
dependencies' CMakeLists, we just need to use `find_package(<name> ...)`
to declare dependencies. The Dependency Provider then intercepts those
`find_package` calls and satisfies the dependency however we like.
To clean things up further, the default provider
(build_tools/cmake/DependencyProvider.cmake) now *declares* dependencies
at the top of the file (rather than forcing them to be downloaded
immediately), and they are only actually downloaded and incorporated
into the build when the `find_package` call is made.
In our case, `find_package` will cause `CPMAddPackage` to be invoked.
`nv_register_package` is the declaration function. It accepts the
`CPMAddPackage` arguments as well as a special `POST_ADD_HOOK` field
that specifies an inline script to be executed. The contents of
`nv_register_package` are simply saved and forwarded to `CPMAddPackage`
(less POST_ADD_HOOK) when `find_package` is invoked. Finally, after
`CPMAddPackage` returns, the `POST_ADD_HOOK` is executed.
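The control flow described above (declare first, add on first `find_package`, then run the hook) can be modeled in a few lines of Python. This is a model of the pattern, not CMake code; `register_package`, `find_package`, and the `events` log are hypothetical stand-ins for `nv_register_package`, the intercepted lookup, and the forwarded `CPMAddPackage` call:

```python
from typing import Callable, Dict, List, Optional

_registry: Dict[str, dict] = {}   # declarations saved up front
_added: List[str] = []            # packages already brought into the build
events: List[str] = []            # log used to show ordering


def register_package(name: str,
                     post_add_hook: Optional[Callable[[], None]] = None,
                     **add_args) -> None:
    """Declaration step: save the arguments; nothing is downloaded yet."""
    _registry[name] = {"args": add_args, "hook": post_add_hook}


def find_package(name: str) -> bool:
    """Intercepted lookup: add the package on first use, then run its hook."""
    if name in _added:
        return True
    decl = _registry.get(name)
    if decl is None:
        return False  # not declared here; the request is left unsatisfied
    events.append(f"add {name} {decl['args']}")  # stands in for the forwarded call
    _added.append(name)
    if decl["hook"] is not None:
        decl["hook"]()
    return True


register_package("flatbuffers", version="25.2.10",
                 post_add_hook=lambda: events.append("hook flatbuffers"))
found = find_package("flatbuffers")
```

Registration alone leaves `events` empty; the add and the hook run only when the lookup happens, and only once.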
--
c2fcfa3f2db54c4ab0737f4df504776b4b07f19c by Christopher Bate
<[email protected]>:
[cmake] Fix library search directory for TensorRT dialect LIT tests
Fixes the search directory for dynamic libraries, resolving build
failure when BUILD_SHARED_LIBS=ON.
--
9fe64b7d5075d615c520232cf79f60982d382ec1 by Christopher Bate
<[email protected]>:
[tensorrt] NFC: Factor out 'isTensorRTInt8Type' to fix circular build
issue
Recently, the addition of the new plugin support for TRT 10.8 caused a
circular dependency between the TensorRT Dialect library and the plugin
support library. This only became apparent when building with
BUILD_SHARED_LIBS=ON. This change factors out a "base" library for the
TensorRT dialect that resolves the circular dependency issue.
--
638abb11a731d48599ac2af349673e8324237931 by Sagar Shelke
<[email protected]>:
[compiler/lib/Conversion] Add support for `stablehlo.custom_call`
conversion
This MR adds a converter for `stablehlo.custom_call` op conversion to
TensorRT. Currently the only valid conversion is when the call target is
`mhlo.topk`; more conversions can be added later.
MLIR tests are added.
--
669d488b6b13521df2b54246f4fcd301ab6a8ace by Christopher Bate
<[email protected]>:
NFC: Fix incorrect header in NvInferPluginUtils
Point the include to TensorRTBase.h; otherwise we may be including
generated headers on which NvInferPluginUtils does not have a dependency
(and might not be generated yet in the build).
--
340a8a5e0888a8c881d469758f4c871f7233e976 by Christopher Bate
<[email protected]>:
[executor] Update 'allocs-to-globals' pass to use DataLayout for type
size calculation
Fixes an issue where 'allocs-to-globals' can crash if the memref
element type does not have a canonical bitwidth, e.g. `complex<...>`.
Use the module's data layout to get the type byte size instead.
--
6ca9b5da0d7bd672aaef7d2bc78a917d1ed159a2 by Chris Bate
<[email protected]>:
[compiler] Cleanup pipeline declarations in Plan Dialect Transforms
This change removes unused declarations from the Plan Dialect 'Passes.h'
header file and removes unused pipeline registrations. We consolidate
the three bufferization stages (bufferization, optimizations,
deallocation) into a single pipeline declaration that is used by the
different top-level pipelines (e.g. linalt-to-executable,
stablehlo-to-executable, and so on).
--
464616b4291de0dc80a6b6ce40e7ecd8a06c99fd by Christopher Bate
<[email protected]>:
[compiler] NFC: add some additional 'plan-bufferize-pipeline' regression
tests
Adds some additional regression tests for the end-to-end bufferization
and deallocation pipeline. This set of tests captures cases where we are
producing sub-optimal code related to host constants and shape values.
--
283bf821e4bc6f8a6b5a4f17509d87fcf72a0023 by Christopher Bate
<[email protected]>:
NFC: fix missing FileCheck command in test added in
6ca9b5da0d7bd672aaef7d2bc78a917d1ed159a2
--
bf050a70cdc1c714b5f1d782a043f889abf3272a by Christopher Bate
<[email protected]>:
[compiler] Use bufferization analysis in 'plan-alloc-tensors'
This change reworks the parts of the 'plan-alloc-tensors' pass that
tried to establish DPS connectivity in loop regions and at function
boundaries. We now use existing routines in the bufferization analysis
to support the transformation, which produces better results and
simpler code. Additionally, this fixes a bug where the transform could
potentially try modifying functions without fixing the callers if
inlining was not performed. We now explicitly check to make sure the
functions we are modifying do not have any callers and that we do not
traverse nested symbol tables. A future change can update the pass to
allow modifying functions with callers.
--
9f03c958cdee998d4c6ea059aa9125e395b68688 by Christopher Bate
<[email protected]>:
[compiler] Add new module bufferization pass
This change forks the upstream 'one-shot-module-bufferization' logic
into a 'module-bufferization' pass under the 'mlir-tensorrt/compiler'
project.
The upstream 'OneShotModuleBufferization' transformation is an extension
of the core one-shot-bufferization infrastructure that includes some
additional analysis state which is attached to functions. Functions are
analyzed to identify which arguments are read/written and additionally
which arguments are equivalent to which result values.
We require forking the upstream pass because the upstream transformation
does not currently allow for supporting functions and call-like
operations outside of `func.func` and `func.call` (support for other
operations using the appropriate interfaces has had trouble landing
upstream). In addition, the upstream pass does not take into account the
potential nesting of modules/symbol-tables where nested modules may
require different bufferization options than the parent module and
functions may be called from a parent module (if using call-like
operations other than `func.call`).
To allow dynamic querying of bufferization options from modules, we
create a new interface, `BufferizationScopeOpInterface` that lets us
attach methods for producing OneShotBufferizationOptions to our custom
module-like operations.
This new interface, together with the forked pass logic, allows us to
jointly bufferize multiple nested modules, whereas previously the
OneShotModuleBufferization limitations forced a phase-ordering
constraint -- nested modules had to be bufferized prior to outer
modules.
--
6fbed19b52415fc21ccc13807b1112886f405ef8 by Christopher Bate
<[email protected]>:
[executor] Fix Executor flatbuffer C++ generation cmake function
This change simply ensures that if the flatbuffer dependency version
changes that the targets which generate C++ from the flatbuffer schema
file will also go out-of-date, forcing re-generation of the C++
flatbuffer files.
--
cd8c672be49aaafc5d480d24f900a413f1effd59 by Christopher Bate
<[email protected]>:
[executor] Bump Flatbuffers to v25.2.10
Bumps the Flatbuffers version and migrates it to the new dependency
handling mechanism.
--
99f60bfe28d1247c060cc0be232ec73546785bb9 by Christopher Bate
<[email protected]>:
[executor] Set max Flatbuffer verification size to the 64bit limit
Since we use Flatbuffer's 64bit features, we need to explicitly enable
verification to allow 64bit offsets.
--
523da05c5720f395ce9c5815d2a44e82f51e924d by Christopher Bate
<[email protected]>:
[compiler] Change 'allow-return-allocs-from-loops' to 'true' by default
This allows us to handle loops that don't bufferize in-place during
'plan-module-bufferization'. This was the original behavior of
`plan-bufferize`. Certain integration tests for generation models won't
work without it currently.
--
a31e690e2a863aaf52528b8de1c66a2f1b1ea081 by Christopher Bate
<[email protected]>:
[tensorrt] Add a size limit to constant folding in
'tensorrt-transpose-elimination'
Constant folding of transpose operations in the
`tensorrt-transpose-elimination` pass can be extremely costly for
large weights (e.g. embedding matrix). We need to impose an upper bound
on what constant folding we will allow using naive transposition of
DenseElementsAttrs.
--
3b7017e2356022db4ed21369422b2ead639b930b by Christopher Bate
<[email protected]>:
[tensorrt] NFC: add a statistic to the 'tensorrt-broadcast-elimination'
pass
Add a simple statistic (only set when built with assertions enabled)
that counts the number of eliminated broadcast operations.
--
6feeec4d85ab976ed16676795aa426a87f4a7f38 by Christopher Bate
<[email protected]>:
[tensorrt] Place an upper bound on constant folding computation
Places an upper bound of `1<<17` on the number of elements allowed for
constant folding of `DenseElementsAttrs` in the `fold` method of
TensorRT operations.
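A guard of this kind can be sketched as below. The `1 << 17` limit comes from the text above; whether the real check is inclusive, and the names `within_fold_limit` and `try_fold`, are assumptions made for illustration:

```python
from math import prod
from typing import Callable, List, Optional, Sequence

MAX_FOLD_ELEMENTS = 1 << 17  # element-count limit stated in the commit message


def within_fold_limit(shape: Sequence[int]) -> bool:
    """Return True when a constant of this shape is small enough to fold."""
    return prod(shape) <= MAX_FOLD_ELEMENTS  # inclusive bound is an assumption


def try_fold(values: List[float], shape: Sequence[int],
             fn: Callable[[float], float]) -> Optional[List[float]]:
    """Fold element-wise only for small constants; otherwise decline."""
    if not within_fold_limit(shape):
        return None  # leave the operation in place instead of folding
    return [fn(v) for v in values]


small = try_fold([1.0, 2.0, 3.0, 4.0], (2, 2), lambda v: -v)
```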
--
e8690a68101085e606746361db41533597ef3c1d by Sagar Shelke
<[email protected]>:
[Conversion/StablehloToTensorRT] Ease restrictions of `stablehlo.reduce`
conversion to TensorRT
Previously, `stablehlo.reduce` conversion to TensorRT had the following
restrictions.
- If the reduction axes were not contiguous, conversion was unsupported.
- With multiple reduction axes, the reduction was performed via a
reshape followed by a reduction on a single axis.
These restrictions turn out to be unnecessarily strict: TensorRT
supports reduction over multiple as well as non-contiguous axes.
Stablehlo to TensorRT conversion tests are updated and a tensorrt test
is added. Stablehlo to TensorRT engine conversion path is tested for
reduction specific cases with both TensorRT 8.6 and 10.7.
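For reference, the semantics of a reduction over multiple non-contiguous axes can be written out in plain Python. This sketch only shows what such a reduction computes on a row-major buffer; it is not the conversion code, and `reduce_sum` is a hypothetical name:

```python
from itertools import product
from typing import List, Sequence, Tuple


def reduce_sum(data: Sequence[float], shape: Sequence[int],
               axes: Sequence[int]) -> Tuple[List[float], List[int]]:
    """Sum a flat row-major buffer over the given axes in a single pass."""
    reduced = set(axes)
    kept = [i for i in range(len(shape)) if i not in reduced]
    out_shape = [shape[i] for i in kept]
    sums = {}
    # product() varies the last index fastest, matching row-major order.
    for flat, index in enumerate(product(*(range(d) for d in shape))):
        key = tuple(index[i] for i in kept)
        sums[key] = sums.get(key, 0) + data[flat]
    ordered = [sums[k] for k in product(*(range(d) for d in out_shape))]
    return ordered, out_shape


# Reduce a 2x3x2 tensor over axes 0 and 2, which are not adjacent.
result, result_shape = reduce_sum(list(range(12)), (2, 3, 2), axes=(0, 2))
```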
--
af0c30a2d3943a8f8b5a80be49a7c0fc945ae07d by Christopher Bate
<[email protected]>:
[compiler] Fix failing `test_tensorrt_add.py` after importing OSS
changes
Fixes a test that was now failing after importing GitHub MLIR-TensorRT
changes.
--
5599299c5ce64aa0cafad411d90d7fe3bbda7f26 by Christopher Bate
<[email protected]>:
NFC: [executor] cleanup TensorRTModule output allocator logic
Updates TensorRT output allocator logic to remove "reallocation"
features which were not usable. As we currently use it, the output
allocator lazily allocates new results for each invocation. In order to
improve allocator performance in the future, we should:
a) Pre-allocate when we know the static dimensions of the output.
b) Represent the allocator in the IR so that it can be hoisted out of
   loops, allowing the outputs of TRT engines whose results don't escape
   the function to be re-used without an additional allocation.
Co-authored-by: Sagar Shelke <[email protected]>
Co-authored-by: Zixin Huang <[email protected]>
Co-authored-by: Yuan Yao <[email protected]>
Co-authored-by: Samurdhi Karunaratne <[email protected]>
GitOrigin-RevId: 4c4fdae1e9ae089df09e61abf06be28adc7217ca