Releases: sophgo/tpu-mlir

v1.15.2

24 Feb 10:21
whl release bug fix

fix inconsistent dependency package versions between the release package and the Docker image

Change-Id: I7d61dafebf10d8e74799a569dfda65173b13ed2e

v1.15.1

08 Feb 03:38
fix convbwd, bnbwd bug

fix convbwd and bnbwd bugs for ResNet training

Change-Id: I0ea22cf44cbe54220423e7fee5746f30ff329f6d

v1.15

05 Feb 04:53
update pypi release script

update deprecated v3 artifact upload/download to v4

Change-Id: I0c67c4bc093413f16d349fe97d65b37f42f26580

v1.15-beta.0

20 Jan 13:09
feat: make lmem assignment stage more analyzable

- define some commonly used LOG macros (Logger.h)
- define some stringify functions to show lmem type and timestep mode
  (LayerGroupDefs.h)
- add show_timestep_table to print a readable timestep table
  (BasicTimeStep.h/BasicTimeStep.cpp)
- add many DEBUG_WITH_TYPE logs and comments in the lmem assignment stage
  (BasicTimeStep.cpp/LmemAllocator.cpp/TimeStepMethod.cpp/SwPipeline.cpp)
- rename some variables and functions to better represent the
  process (gen_all_mem_buffer_ts/tgt_min_address/...)
- reduce the cyclomatic complexity of assignLmemAddr (LmemAllocator.cpp:989)

Change-Id: I31dadb9424be334da481f9dfbd45985ca89dc058

v1.14

31 Dec 09:16
[doc] refine user interface

Change-Id: I887ad481b2a3b1f7dce4fe993399ec2afa093bb4

v1.14-beta.0

24 Dec 03:45
fix bug in build ppl

Change-Id: Ib93341da7fa6b420f9fb9cd9e4b61dc21aeaf001

v1.13

01 Dec 04:53
add a16 matmul multi_core

Change-Id: I10a9097ee52e324555f4a505ce18d7fe9b665803

v1.13-beta.0

22 Nov 13:12
[doc] layergroup opt intro

Change-Id: I0797b73e4d020e9556da29d1c1a743b8c80a83ad

v1.12

05 Nov 07:30

Features

  • Support for backend operators implemented using PPL.
  • TPUv7-runtime CModel integrated with TPU-MLIR for BM1690 model CModel inference.
  • Optimized inference speed for BM1690 Stable Diffusion 3.0 at 512 resolution to 0.72 img/s (MAC utilization: 41.9%).
  • Support for training graph compilation of ResNet50-v1 through FxGraphConverter.

Bug Fixes

  • Performance: Fixed the issue of performance degradation in SegNet.
  • Functionality: Resolved the compilation comparison issue for BM1688 DeepLabv3P.

Known Issues

  • Performance: Slight performance degradation observed in BM1690 YOLOv5-6 with batch size 4, INT8, on eight cores.

v1.12-beta.0

25 Oct 10:18
combine slice and concat into new Rope (ConcatToRope)

Change-Id: Ib15b12fe97117b96c6fe7267c96c3f714aac6ec4