
Releases: sophgo/tpu-mlir

v1.6-beta.0

29 Jan 13:39
Pre-release

New Features

  • Implemented the SG2260 structureOp interface and structured transform, including a solver for finding transforms.
  • Added a OneHot converter and fp8 support in the debugger.
  • Supported MatMulOp special cases with broadcast in batch dims and added an interface for attention.
  • Provided "decompose linalg op" and "tile+fuse" passes so MatMul parallelization supports more batch patterns.
  • Added a UNet single-block test.
  • Implemented fp8 support for MatMul and other ops, including addconst, subconst, mul, add, sub, and abs.
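The new OneHot converter follows the usual ONNX OneHot semantics (negative indices wrap around `depth`). A minimal NumPy sketch of what the conversion computes — illustrative only, not the converter's actual code:

```python
import numpy as np

def one_hot(indices: np.ndarray, depth: int, axis: int = -1,
            on_value: float = 1.0, off_value: float = 0.0) -> np.ndarray:
    """Reference ONNX-style OneHot: negative indices wrap around depth."""
    idx = np.where(indices < 0, indices + depth, indices)
    # build with the one-hot axis first, then move it to the requested position
    eye = np.full((depth,) + indices.shape, off_value, dtype=np.float32)
    for pos, i in np.ndenumerate(idx):
        if 0 <= i < depth:
            eye[(int(i),) + pos] = on_value
    return np.moveaxis(eye, 0, axis)
```

Out-of-range indices simply produce an all-`off_value` row, matching the ONNX spec's behavior.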

Performance Improvements

  • Improved MatMul fp8 performance with new backend support.
  • Enabled distributed MLP and attention with improved performance, and fixed cascade_net input/output names and order.
  • Refactored tdb to improve disassembler serialization and resolve a BM1688 decoding issue.
  • Improved weight reorder for ConvOp and optimized the permute of attention matmul.
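The fp8 support mentioned above is not further specified in these notes; assuming the common E4M3 variant (1 sign, 4 exponent, 3 mantissa bits, max normal 448), a NumPy sketch of round-to-nearest quantization — purely illustrative, not the backend's implementation, and with no NaN/Inf handling:

```python
import numpy as np

def round_to_e4m3(x) -> np.ndarray:
    """Round values to the nearest FP8 E4M3 number (illustrative)."""
    x = np.clip(np.asarray(x, dtype=np.float64), -448.0, 448.0)  # E4M3 max normal
    flat = x.ravel()
    out = np.zeros_like(flat)
    for i, v in enumerate(flat):
        if v == 0.0:
            continue
        s, a = (1.0, v) if v > 0 else (-1.0, -v)
        e = max(int(np.floor(np.log2(a))), -6)   # -6 is the smallest normal exponent
        m = round(a / 2.0 ** e * 8.0) / 8.0      # quantize mantissa to 3 bits
        out[i] = np.clip(s * m * 2.0 ** e, -448.0, 448.0)
    return out.reshape(x.shape)
```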

Bug Fixes

  • Resolved various bugs in MatMul, Conv, and other ops across multiple chips, including SG2260, BM1688, and CV18xx.
  • Fixed bugs in ReduceOp, ArgOp, SliceOp, and others for better operation and tensor handling.
  • Addressed issues in SAM, the daily test, and tdb related to core operations and functionality.
  • Fixed memory and data handling bugs for more accurate and stable model execution.

Documentation Updates

  • Updated documentation to remove sensitive words and improve clarity and comprehensiveness.

Miscellaneous

  • Enhanced various backend libraries and supported new ops and patterns for more efficient and versatile model handling.
  • Improved ScatterElements and Reduce dynamic shape_value handling for better model optimization.
  • Refined graph optimization, permute parallel indexMapping, and related areas for improved model processing.

Technical Preview

03 Nov 10:00
Pre-release

TPU-MLIR Project Update

Bug Fixes and Dependency Updates

  • Fix Dependency: Fixed the dependency of MLIRInputConversion.
  • SDK Release Workflow: Fixed tpu-mlir tag for building and added workflow file for SDK release.
  • Softplus LoweringINT8: Fixed 1684 Softplus LoweringINT8 issue.
  • Slice Begin Index: Fixed bm1684 slice begin_index problem.
  • Mul Conflict Resolution: Partially fixed cases where the output data sign of mul conflicts with a chip restriction.

Feature Enhancements and Support

  • Subgraph Split Support: Enhanced support for subgraph split.
  • Quant IO List Note: Added quant io list note for better quantization handling.
  • New Full Operation: Supported the aten::new_full operation.
  • Torch Flip for bm1684x: Added support for torch.flip for bm1684x.
  • Weight Input Shape Bind: Supported shape bind for weight input.
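The newly supported torch.flip has straightforward semantics: reverse the tensor along each listed dimension. A NumPy sketch of the reference behavior (illustrative; the converter itself lowers to backend ops):

```python
import numpy as np

def flip(x: np.ndarray, dims) -> np.ndarray:
    """Reference semantics of torch.flip: reverse x along each dim in dims."""
    slices = [slice(None)] * x.ndim
    for d in dims:
        slices[d] = slice(None, None, -1)   # reversed view along that axis
    return x[tuple(slices)]
```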

Updates and Implementations for Specific Operations

  • Backend Update for sg2260: Updated the sg2260 backend for tag31.
  • ScatterElements Implementation: Implemented ScatterElements for any axis.
  • Unary Indexing Map: Added unary indexing map.
  • Binary Indexing Map: Added binary (add/sub/mul/div/min/max) indexing map.
  • Dynamic NMS Support: Featured support for dynamic nms for bm1684x.
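"ScatterElements for any axis" follows the ONNX ScatterElements definition: every element of `indices` selects, along the chosen axis, where the corresponding element of `updates` is written. A reference NumPy sketch (not the actual kernel):

```python
import numpy as np

def scatter_elements(data: np.ndarray, indices: np.ndarray,
                     updates: np.ndarray, axis: int = 0) -> np.ndarray:
    """Reference ONNX ScatterElements: write updates into a copy of data,
    addressing the given axis element-wise via indices."""
    out = data.copy()
    for pos in np.ndindex(indices.shape):
        target = list(pos)
        target[axis] = indices[pos]   # redirect only the scattered axis
        out[tuple(target)] = updates[pos]
    return out
```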

Codebase and Documentation Refinements

  • Cleanup: Removed test/sg2260 dialect.
  • Documentation Update: Updated nntoolchain README and lib.
  • Codegen Documentation: Added documentation for codegen.
  • Template Format Update: Updated import mlir file template format.
  • Quick Start Docs Modification: Modified quick start docs for tpu-mlir.

Optimizations and Performance Improvements

  • Kernel Module Usage: Reverted to using the old kernel module.
  • MLIR Conv2D Optimization: Improved 1684 mlir conv2d with 3ic optimization.
  • SWINT Quantization: Added swint quant for better performance.
  • Opt Parameter Addition: Added an optimization parameter.
  • Loop and Fusion Enhancements: Supported interchange of inner loop, padOp transform, tensor op collapse, fusion on linalg-on-tensor, etc.
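Loop interchange, one of the transforms listed above, reorders a loop nest without changing its result (the real pass works on linalg IR, not Python). A toy matmul showing that the interchanged i-k-j order computes the same values as the textbook i-j-k order, while streaming through B's rows contiguously:

```python
def matmul_ijk(A, B, M, N, K):
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B, M, N, K):
    # inner two loops interchanged: B[k] is traversed row-wise (better locality)
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for k in range(K):
            for j in range(N):
                C[i][j] += A[i][k] * B[k][j]
    return C
```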

Technical Preview

27 Sep 13:50

🐳 Docker Image Update

Changed required Docker image from sophgo/tpuc_dev:v2.2 to sophgo/tpuc_dev:v3.1, which is based on Ubuntu 22.04.

📖 Documentation

Updated docs to add more parameters in model deployment.

🐛 Bug Fixes

Fixed TPU-MLIR dialect Python binding for DEBUG mode.
Resolved backward training bug.
Addressed average pooling and max pooling issues.
Several other bug fixes related to Winograd inference, training, and more.

🚀 Feature Additions

Added Deconv3D backend support.
Support for dynamic tile added for bm1684x.
Added Winograd feature.
Several other feature additions, including dual-core support in debugger, MatMulSliceMerge support for int8/int4, and more.
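Dynamic tile repeats a tensor along each dimension by a (possibly runtime-computed) repeat count. A reference NumPy sketch of the semantics, expressed as per-axis concatenation (illustrative; the backend implements this natively):

```python
import numpy as np

def tile(x: np.ndarray, repeats) -> np.ndarray:
    """Reference Tile semantics: repeat dimension d repeats[d] times."""
    out = x
    for d, r in enumerate(repeats):
        out = np.concatenate([out] * r, axis=d)
    return out
```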

🔧 Code Maintenance

Code renaming and cleaning.
Regression adjustments and tests.

⚙️ Backend Updates

Backend updates for various architectures including BM1684 and sg2260.

Technical Preview

21 Aug 14:23
Pre-release

New Features and Enhancements

  • Support for Various Operations: Added support for exp, erf, gelu, loopop, and other operations for specific platforms.
  • Tooling and Visualization: Renamed profile.py, added visual tools for weights, and enhanced debugging capabilities.
  • Model Support and Adjustments: Added daily release models, scripts, and support for specific model types like yolov8, yolov4s.
  • Distribution and Multicore Support: Implemented distribution steps, multicore support, and group convolution transformation.

Bug Fixes and Resolutions

  • Model and Parsing Fixes: Resolved issues in emvd models, parsing errors, slice bugs, and fixed specific issues in bm1684 and bm1686.
  • Codegen and Canonicalization Fixes: Addressed type errors, canonicalization failures, and operand kind checks.
  • Inference and Optimization Fixes: Fixed inference issues in max, where, and slice operations, and refined canonicalization.

Documentation and Cleanup

  • Documentation Updates: Refined tpu-mlir docs, added a supported-ops document, and updated specific documents.
  • Code Cleanup and Refactoring: Removed unnecessary code, reconstructed permute move canonicalization, and prepared for LLVM upgrade.

Other Changes

  • Testing and Calibration: Added test cases, calibration updates, and support for regression and tag in TDB.
  • Backend and Runtime Adjustments: Updated backend, added support for auto-increase op, and fixed minor bugs.

Technical Preview

26 Jul 09:25

Features:
BM1686: support post handle op, provide parallelOp codegen, add DivOp for f16/bf16.
BM1684: support dynamic compilation loading tensors from L2mem, implement the GROUP_3D local layer function, support more dynamic ops (like MinConst, MaxConst, Lut) and some static ops (like deform_conv2d).
CV18XX: support more ops, like equalOp.
Support IfOp for f16/bf16/int8 mode.
Implement the post-process function of sensitive layer, unranked tensor and dynamic tensor at the frontend; add empty and baddbmm torch converters/interpreters.
Support weight split in layer group when the op is broadcast binary, support parsing ops of each layer in top.mlir, support int32 to i/u8 inference for model_runner.py.
Remove onnx-sim and use unranked_type for all ops.
Implement more graph optimizations: merge matmul + add into matmul for float types, a fuse-same-operation pass, and weight transpose for permute+add.
Support more torch ops, like rmsnorm, ceil, remainder.
Other new operations: lowering of GatherElements, multi-input Add.
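The new GatherElements lowering implements the ONNX op of the same name: for every position of `indices` (same rank as `data`), pick one element of `data` along the chosen axis. A reference NumPy sketch, with `np.take_along_axis` as a cross-check:

```python
import numpy as np

def gather_elements(data: np.ndarray, indices: np.ndarray, axis: int = 0) -> np.ndarray:
    """Reference ONNX GatherElements (inverse addressing of ScatterElements)."""
    out = np.empty(indices.shape, dtype=data.dtype)
    for pos in np.ndindex(indices.shape):
        src = list(pos)
        src[axis] = indices[pos]   # only the gathered axis is redirected
        out[pos] = data[tuple(src)]
    return out
```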

Bug Fixes:
Fix chatglm2 rmsnorm untransformed problem, ScaleOp inference error, bmodel_dis format bin, shape inference of matmul, and a subnet output order mismatch causing errors in dynamic runtime.
Avoid duplicate names for inserted CastOps; distinguish caffe matmul shape.

Code Refactoring:
Use llvm::md5, llvm::sha256.
Use Clang to speed up code compilation.
Remove some unused header files.
Use rewriter.eraseOp instead of op->erase; use a string to define padding mode.
Refine disassembler, refactor mix_precision.

Documentation Updates:
Update document version and change some model-zoo requirements.
Modified the English part and updated the developer_manual doc for the visual.py section.

Testing and Verification:
Updated list of test models supported by BM1684X.

Technical Preview

19 Jun 03:40

Features:
Supported 'Conv3D', 'Pool3D', 'Pow2(n^x)', 'Softplus', 'GRU', 'Scale' for BM1684, more models available like wenet-encoder.
Supported some operations like 'DictConstruct', 'Sub', 'Ones_like', 'Zeros_like', 'ChannelShuffle', 'Activation', 'Conv3d', 'Compare', 'GroupNorm', 'InstanceNorm', 'Clamp' in PyTorchConverter.
New ONNX operations in OnnxConverter, like 'GridSample', 'CompareCst'.
Supported more dynamic operations like 'Arg', 'Active', 'Reduce', 'Min', 'Max' for BM1684.
Add depth2space to the backward pass, 1684x yolov5 postprocess, and a CopyMultiUseWeight pattern before shape_infer.
Improved the previous subnets' type-check logic; added some parallelism in learning quant.
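depth2space (DepthToSpace) rearranges channel blocks into spatial blocks; the standard DCR-mode formulation on an NCHW tensor is a reshape plus transpose. A NumPy sketch of the reference semantics (illustrative, not the backend kernel):

```python
import numpy as np

def depth2space(x: np.ndarray, block: int) -> np.ndarray:
    """Reference DepthToSpace (DCR mode) on an NCHW tensor."""
    n, c, h, w = x.shape
    c_out = c // (block * block)
    x = x.reshape(n, block, block, c_out, h, w)     # split channels into blocks
    x = x.transpose(0, 3, 4, 1, 5, 2)               # interleave blocks with h, w
    return x.reshape(n, c_out, h * block, w * block)
```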

Bug Fixes:
Fixed the weight display problem in the visual tool and model_deploy failing when --test_reference is none.
BM1684: fix8b large dilation weight reorder; MulConst, AddConst, SubConst local buffer size; mulshift local buffer.
BM1684X: 5-dim broadcast add, attention and utest bugs, 5-dim support for scatternd, YoloDetection inference bug, strideslice op needing begin_mask/end_mask for dynamic shape.
CV18XX: fix gray fuse preprocess, fix TgScaleLutKernel pass.
OnnxConverter: convert_add_op fix broadcast channel when r_dim is 1; infer subgraph to get shape and fix attr 'axes' not in squeeze.
Others: fix sdk demo problem, a hang caused by an assert in cmodel, group overlap tensor id error, and python array with random data.

Code Refactoring:
Redesigned subnet splitting, sorting, merging and running order.
Refine 18xx codegen, conv quantization, gather lowering and debugger's dictionary-structure.
Rename bdc to tiu.
Reset pattern of onnx subconst op.
Simplify layernorm to single output.

Documentation Updates:
Fix quick_start typo.
Update yolov3_tiny output_names.
Refine yolov5 postprocess chapter, cv18xx quick start doc.

Testing and Verification:
Update yolov3 regression test, bayer2RGB model sample, squeezenet_v1.1_cf.
Save a copy of bert_base 2.11 version config for cali.
Add timeout check and model test timeout for tests.
Add many cv18xx model regression.
Align cv18xx detect samples and YOLODetection Func.

Technical Preview

29 May 04:33
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Features:

Added a feature called "bmodel_checker", which aids in checking the correctness and functionality of BModels.
Supported LSTM (Long Short-Term Memory) for bm1684, indicating improved capabilities for handling sequence data.
Added support for the ONNX Loop operation, expanding the range of operations that can be performed using the ONNX format.
Implemented support for operations like 'stack', 'new_zeros', 'new_ones' in PyTorch.
Added a new visual tool for analyzing the parameters or operation of the models.
Added support for TensorFlow's MobileBert model.
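The PyTorch ops above have simple semantics: `new_zeros`/`new_ones` allocate a tensor of a given shape with the source tensor's dtype, and `stack` concatenates along a newly inserted dimension. A NumPy sketch of the reference behavior (hypothetical helper names, for illustration only):

```python
import numpy as np

def new_zeros(ref: np.ndarray, shape) -> np.ndarray:
    """torch.Tensor.new_zeros: zeros of `shape`, inheriting ref's dtype."""
    return np.zeros(shape, dtype=ref.dtype)

def new_ones(ref: np.ndarray, shape) -> np.ndarray:
    """torch.Tensor.new_ones: ones of `shape`, inheriting ref's dtype."""
    return np.ones(shape, dtype=ref.dtype)

def stack(tensors, dim: int = 0) -> np.ndarray:
    """torch.stack: concatenate along a newly inserted dimension."""
    return np.concatenate([np.expand_dims(t, dim) for t in tensors], axis=dim)
```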

Bug Fixes:

Fixed a bug related to 'decode lmem address', which might have caused issues in decoding addresses.
Addressed the 'incomplete onnx shape info' bug, improving the reliability of using ONNX format models.
Resolved an issue with 'single thread of int4 regression test', enhancing the testing suite.
Fixed the 'group deconv' and 'deconv1d' issues, optimizing the performance of deconvolution operations.
Resolved an error in the ArgError[18xx] case in 'test_onnx.py'.
Corrected an issue causing MulConst overflow in certain cases.

Code Refactoring:

Refactored BModel_dis to make it more efficient or easier to understand.
Unified the codegen pass to simplify the code generation process.
Revised the argument structure of bmodel_checker for more logical and intuitive use.
Modified the PermutePadSwap function to accommodate more situations.
Refined memory usage for large models, improving efficiency and performance.
Removed unused files and refactored main_entry, run_model, and cfg files for more streamlined execution.

Documentation Updates:

Updated the README file to provide up-to-date information.
Synced with model-zoo to maintain the relevance of documentation.
Added a description for the visual tool parameter.
Added information on mlir precision test and target in the documentation.
Updated the quick start guide for PyTorch.
Added more detailed information about the new bmodel_checker tool and Tensor Location in the documentation.

Testing and Verification:

Added an inference test for 'stable diffusion.'
Added regression tests for ONNX on the 1684 chip.
Fixed an issue in the ArgError[18xx] case in 'test_onnx.py', improving the ONNX testing suite.
Added an operation regression test for Athena2.
Added a test for 'stable diffusion' to ensure its proper functionality.

Technical Preview

20 May 18:18
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Testing and Verification:

  • Fixed the issue with the daily build test, ensuring a more reliable continuous integration pipeline.

Technical Preview

02 Apr 10:13
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  1. Lots of bug fixes and performance improvements.
  2. TPU-MLIR supports importing PyTorch models directly (no need to convert to ONNX).
  3. Unified pre-processing for bm168x and cv18xx chips.
  4. Support for the bm1684 chip is underway.

Technical Preview

20 Mar 08:29
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  • Resolved pre-processing performance issues.
  • Added shape inference for dynamic input shapes.
  • Implemented constant folding to simplify the graph.
  • Improved performance, still working on optimizations.
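Constant folding, listed above, replaces any node whose inputs are all compile-time constants with its computed result, shrinking the graph before lowering. A toy sketch over a tiny expression-graph representation (hypothetical node format, not TPU-MLIR's IR):

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def fold_constants(nodes):
    """Nodes are (name, op, inputs); op is 'const', 'input', or a key in OPS.
    Returns folded constant values and the surviving (unfoldable) nodes."""
    consts, kept = {}, []
    for name, op, inputs in nodes:
        if op == "const":
            consts[name] = inputs[0]
        elif op in OPS and all(i in consts for i in inputs):
            # every operand is known: evaluate now and drop the node
            consts[name] = OPS[op](consts[inputs[0]], consts[inputs[1]])
        else:
            kept.append((name, op, inputs))
    return consts, kept
```

Here `c = a + b` folds to 5 at compile time, while `y = x * c` survives because `x` is a runtime input.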