Releases · sophgo/tpu-mlir
v1.12
v1.12-beta.0
Combine slice and concat into a new Rope op (ConcatToRope). Change-Id: Ib15b12fe97117b96c6fe7267c96c3f714aac6ec4
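For context, the slice-plus-concat idiom that a ConcatToRope-style rewrite can fuse is the rotate-half construction used by rotary position embeddings. A minimal NumPy sketch of that idiom (illustrative only; not the compiler pattern itself):

```python
import numpy as np

def rotate_half(x):
    # Slice the last dimension into two halves, negate the second half,
    # and concatenate -- the slice+concat idiom that a ConcatToRope-style
    # rewrite could collapse into a single Rope op.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate((-x2, x1), axis=-1)

def apply_rope(x, cos, sin):
    # Standard rotary position embedding applied elementwise.
    return x * cos + rotate_half(x) * sin
```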
v1.11
[python] Distinguish the model-zoo data path from the regression data path. Change-Id: I98fa0df1524f0b38d91cda02ab5d49876f7caee8 (cherry picked from commit fa082d0b29df8a82af77839df86349aabab86949)
v1.11-beta.0
[soc_dump] Add documentation. Change-Id: Icaf313113415a9bf0ad9c75abdcb609d661c815b
TPU-MLIR v1.10 Release
Release Note
Enhancements:
- Added CUDA support for various operations like conv2d, MatMul, dwconv, pool2d, and more.
- Improved performance for operations like MeanStdScale and softmax.
- Enhanced multi-core batch MatMul and added support for bm168x with CUDA.
- Refined CUDA code style and adjusted interfaces for various operations.
Bug Fixes:
- Fixed issues with matmul, calibration failures, conv pad problems, and various performance problems.
- Addressed bugs in model transformation, calibration, and various rewrite patterns.
- Resolved bugs in different model backends like ssd, vit, detr, and yolov5.
New Features:
- Added support for new models like resnet50, mobilenet_v2, shufflenet_v2, and yolox_s/alphapose_res50.
- Introduced new operations like RequantIntAxisOp and Depth2Space with CUDA support (a Depth2Space sketch follows this list).
- Implemented new functionalities for better model inference and compilation.
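For illustration, Depth2Space rearranges channel blocks into spatial positions. A minimal NumPy sketch of the reference semantics, assuming NCHW layout and DCR mode (the mode actually used by the new op is not specified here):

```python
import numpy as np

def depth_to_space(x, block):
    # DCR-mode depth-to-space on an NCHW tensor: channel groups become
    # block x block spatial neighborhoods (NCHW and DCR are assumptions).
    n, c, h, w = x.shape
    x = x.reshape(n, block, block, c // (block * block), h, w)
    x = x.transpose(0, 3, 4, 1, 5, 2)
    return x.reshape(n, c // (block * block), h * block, w * block)

x = np.arange(16, dtype=np.float32).reshape(1, 4, 2, 2)
print(depth_to_space(x, 2).shape)  # (1, 1, 4, 4)
```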
Documentation Updates:
- Updated weight.md, calibration sections, and user interface details.
- Improved documentation for quick start, developer manual, and various tpulang interfaces.
- Enhanced documentation for model transformation parameters and tensor data arrangements.
Miscellaneous:
- Added new npz tools, modelzoo regression, and support for bmodel encryption (an npz example follows this list).
- Fixed issues with various model performance, shape inference, and CUDA backend optimizations.
- Restored performance for models like yolov5s-6, bm1690 swin multicore, and more.
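Since the tooling exchanges tensors as .npz archives, here is a quick way to create and inspect one with plain NumPy (the tensor name is hypothetical):

```python
import numpy as np

# Write a reference-output archive (tensor name made up for illustration).
np.savez("ref_outputs.npz",
         output_Gemm=np.random.rand(1, 1000).astype(np.float32))

# Inspect the kind of archive an npz tool would operate on.
data = np.load("ref_outputs.npz")
for name in data.files:
    print(name, data[name].shape, data[name].dtype)
```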
TPU-MLIR v1.9 Release
Release Note
Enhancements:
- Implemented output order preservation in converters like ONNX, Caffe, Torch, and TFLite.
- Added support for resnet50-v2 bm1690 f8 regression.
- Improved ILP group mlir file sequences for resnet50 training.
- Updated chip libraries and PerfAI for A2 profiling.
- Added a new dump mode "COMB" and refined abs/relu conversions.
Bug Fixes:
- Fixed issues with preprocessing when the source layout differs from the target layout (a layout-conversion sketch follows this list).
- Addressed bugs in various operations like softmax, concat, and weight reorder in conv2d.
- Resolved bugs in model training, model transformation, and various pattern issues.
- Fixed bugs related to CUDA inference, matmul with bias, and multi-output calibration.
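As background for the layout fix above, converting between an NHWC source layout and an NCHW target layout is a single transpose. A minimal NumPy sketch (shapes chosen for illustration):

```python
import numpy as np

img_nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)  # source: NHWC
img_nchw = img_nhwc.transpose(0, 3, 1, 2)                # target: NCHW
print(img_nchw.shape)  # (1, 3, 224, 224)
```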
New Features:
- Added support for multi-graph in TPULang.
- Introduced new options in TPULang for inference and model deployment.
- Implemented various optimizations and enhancements for dynamic operations and model transformations.
Documentation Updates:
- Refined documentation for quick start quantization and user interface sections.
- Updated backend information, docker image download methods, and model deployment details in the documentation.
Miscellaneous:
- Improved performance for models like vit and yolov5s, and for the bm1690 backend.
- Introduced new functionalities like embedding multi-device slice and groupnorm train operations.
- Added support for adaptive_avgpool inference and multiple Einsum modes.
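For readers unfamiliar with Einsum equation modes, two common ones in NumPy notation (the exact set of modes the converter supports is not enumerated here):

```python
import numpy as np

a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)

# Batched matrix-multiply mode.
c = np.einsum("bij,bjk->bik", a, b)   # shape (2, 3, 5)

# Reduction mode: sum over the trailing axis.
d = np.einsum("bij->bi", a)           # shape (2, 3)
```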
TPU-MLIR v1.8.1
Full Changelog: v1.8...v1.8.1
TPU-MLIR v1.8 Release
Highlights:
Enhancements:
- Added support for dynamic shape inference in various operations.
- Optimized core operations for better performance on specific models.
- Improved backend support for multiple models like BM1684X, BM1688, BM1690, SG2380, etc.
- Introduced new operations and patterns for more efficient model processing.
- Updated documentation for better clarity and user guidance.
Bug Fixes:
- Resolved issues related to input/output handling, kernel configurations, and model-specific bugs.
- Fixed bugs in dynamic compilation, core parallel processing, and various backend operations.
- Addressed errors in specific model post-processing steps like YOLOv5, EfficientNet, etc.
Performance Improvements:
- Optimized cycle calculations for multi-core models.
- Enhanced bandwidth usage statistics for better resource management.
- Accelerated compilation processes for training models using a new layer-group scheme.
New Features:
- Introduced new operations like attention quant block, prelu op, and various dynamic compile features.
- Added support for additional operations, weight location, and dynamic compile enhancements.
Documentation Updates:
- Updated developer manuals, quick start guides, and model-specific documentation for better understanding.
Miscellaneous:
- Streamlined workflows for faster commit checks and improved debugging processes.
- Added new test cases for regression testing and script-based model evaluations.
- Fine-tuned backend operations for improved model performance and accuracy.
TPU-MLIR v1.7 Release
Change Log
New Features
- Added support for new operations including flash attention, custom op dynamic compile, and tpulang ops.
- Enabled AttnReorder and added support for dynamic indices in ops like onehot, scatterelements, and cumsum.
- Added the `--dump_dataframe` option for bmodel_checker and support for transpose with order [1, 2, 3, 0] (a transpose sketch follows this list).
- Introduced the Watchpoint feature to TDB and added support for mixed-precision networks.
- Implemented optimizations for the DMA efficiency of flash attention and optimized the backend for various models.
- Added support for local memory dump in PCIe mode and added various quantization features like eva quant, swin quant, and detr quant.
- Enhanced multi-core support including support for LayerNorm and GroupNorm in coreParallel, and multi-core data slice in tensorLocation.
- Added new patterns for Cswin and Einsum operations.
- Improved support for LLM (Large Language Models) in bm1688.
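To make the transpose order above concrete, order [1, 2, 3, 0] moves the leading axis to the end. In NumPy terms (shape chosen for illustration):

```python
import numpy as np

x = np.zeros((2, 3, 4, 5), dtype=np.float32)
y = np.transpose(x, (1, 2, 3, 0))  # order [1, 2, 3, 0]
print(y.shape)                     # (3, 4, 5, 2)
```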
Bug Fixes
- Fixed various bugs including kernel_module msg_id, SAM-VIT-encoder regression, and attention accuracy problems.
- Addressed logical issues in AddToScale pattern and issues in fp_forward.
- Resolved bugs in model info core dump, op's liveRange in coreParallel, and DevParallel bugs.
- Fixed issues in model combine with io alone and bugs in various ops like interp, RotaryPosEmbPattern, and efficient-lite4 permute.
Performance Improvements
- Improved the performance of TDB and the bmodel_checker for 1684x PCIe.
- Optimized facenet and fixed performance issues of 1688 multicore.
- Enabled single-core mode optimizations where necessary.
Documentation and Testing
- Updated documentation, refined custom chapters, and ensured consistency in quick start docs.
- Added test cases for custom tpulang, multi-core with subnets, and custom cpuop.
- Fixed various documentation errors and updated the release note.
Other Changes
- Added restrictions to tpulang ops and net test cases.
- Adjusted descriptions and refined interfaces for better user experience.
- Updated backend .so files and addressed sensitive words in the codebase.
- Added support for int4 dtype in tpu_profile and ensured tools/scripts work in Python virtual environments.
Technical Preview
Features
- Added support for LLM Decoding by utilizing multi-cores to enhance processing efficiency.
- Introduced `fx2mlir`, a new functionality for enhanced MLIR conversion.
- Implemented `nnvlc2.0` and `nnvlc1.0` for local activation and weight operations, respectively, improving neural network performance.
- Enabled `TPULANG` support for operations like sort, argsort, and additional ops, enhancing the language's functionality and flexibility.
- Added `cv186x` support in `run_sensitive_layer.py` and in the TDB, expanding compatibility and debugging capabilities.
- Introduced new ops and features such as `Watchpoint` in TDB and scale & zero_point support for activation ops (a quantization sketch follows this list), broadening the range of functionality available in the `tpu-mlir` project.
- Added support for `BM1690`.
- L2mem now performs intermediate data exchange for active tensors.
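As background for the scale & zero_point support noted above, affine quantization of an activation follows the standard formula q = clamp(round(x / scale) + zero_point). A minimal sketch assuming an int8 range (ranges and dtypes are illustrative, not the project's exact implementation):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Affine quantization with scale & zero_point (int8 range assumed).
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    # Inverse mapping back to float.
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=0.0078125, zero_point=0)
print(q, dequantize(q, 0.0078125, 0))
```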
Bug Fixes
- Resolved a variety of bugs affecting backend processes, including issues with the `1684x` backend, `permutefuse2`, `permutemulconstswap`, and more, improving overall stability and performance.
- Fixed several critical issues across `tpulang`, including errors in the `sort_by_key`, `reshape`, and `where` operations, enhancing the language's reliability for developers.
- Addressed bugs in model processing, including fixes for `concat` logic, `scale2conv`, `scale2conv3d`, and `instance norm`, ensuring smoother model optimization and execution.
- Corrected errors in the documentation, providing clearer and more accurate information for users and developers.
Documentation Updates
- Updated `tpulang` documentation to include new functionalities and optimizations, making it easier for users to understand and utilize the language effectively.
Performance Improvements
- Optimized TDB and `bmodel_checker` for `1684x` PCIe mode, significantly reducing processing times and enhancing efficiency for model analysis.
- Improved DMA efficiency in flash attention operations, ensuring faster data handling and processing.
- Enabled IO tag mode and refined address mode for better memory management and operational flexibility.