Technical Preview

Pre-release
@luluman luluman released this 01 Apr 09:16
· 800 commits to ind_release since this release

Features

  • Added multi-core support for LLM decoding to improve processing efficiency.
  • Introduced fx2mlir, a new entry point for MLIR conversion.
  • Implemented nnvlc2.0 compression for local activations and nnvlc1.0 for weights, improving neural network performance.
  • Added TPULANG support for sort, argsort, and other operations, extending the language's functionality and flexibility.
  • Added cv186x support to run_sensitive_layer.py and the TDB, expanding compatibility and debugging capabilities.
  • Added watchpoints to the TDB and scale & zero_point support for activation ops, broadening the functionality available in the tpu-mlir project.
  • Added support for BM1690.
  • Enabled L2 memory (L2mem) as an intermediate data-exchange buffer for active tensors.
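
One of the items above adds scale & zero_point support to activation ops. As a rough illustration of what those two parameters mean in standard asymmetric quantization (a generic sketch, not tpu-mlir's actual API):

```python
import numpy as np

def quantize(x, scale, zero_point, dtype=np.int8):
    # q = round(x / scale) + zero_point, clamped to the integer range
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    # recover an approximation of the original float value
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 127, 0
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

The round-trip error is bounded by the scale, which is why per-op scale/zero_point support matters for quantized accuracy.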

Bug Fixes

  • Resolved backend bugs, including issues in the 1684x backend and the permutefuse2 and permutemulconstswap patterns, improving overall stability and performance.
  • Fixed several critical tpulang issues, including errors in the sort_by_key, reshape, and where operations, making the language more reliable for developers.
  • Fixed model-processing bugs, including concat logic, scale2conv, scale2conv3d, and instance norm, ensuring smoother model optimization and execution.
  • Corrected errors in the documentation, providing clearer and more accurate information for users and developers.
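
The scale2conv fixes concern a common graph optimization: a per-channel scale following a convolution can be folded into the convolution's weights and bias, since s·(W∗x + b) + t = (s·W)∗x + (s·b + t). A minimal numpy sketch of the equivalence (names are illustrative, not tpu-mlir's actual pass API):

```python
import numpy as np

def conv2d(x, w, b):
    # naive valid convolution: x (C_in, H, W), w (C_out, C_in, kH, kW), b (C_out,)
    C_out, C_in, kH, kW = w.shape
    H, W = x.shape[1] - kH + 1, x.shape[2] - kW + 1
    out = np.zeros((C_out, H, W))
    for co in range(C_out):
        for i in range(H):
            for j in range(W):
                out[co, i, j] = np.sum(x[:, i:i + kH, j:j + kW] * w[co]) + b[co]
    return out

def fold_scale_into_conv(w, b, s, t):
    # fold a per-output-channel scale (y -> s*y + t) into conv weights and bias
    return w * s[:, None, None, None], b * s + t

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 6))
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
s = rng.standard_normal(4)
t = rng.standard_normal(4)
w2, b2 = fold_scale_into_conv(w, b, s, t)
fused = conv2d(x, w2, b2)
expected = s[:, None, None] * conv2d(x, w, b) + t[:, None, None]
```

Folding removes one elementwise pass over the activation tensor, which is why getting the fold's corner cases right matters for both correctness and performance.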

Documentation Updates

  • Updated tpulang documentation to include new functionalities and optimizations, making it easier for users to understand and utilize the language effectively.

Performance Improvements

  • Optimized TDB and bmodel_checker for 1684x pcie mode, significantly reducing processing times and enhancing efficiency for model analysis.
  • Improved DMA efficiency in flash attention operations for faster data handling and processing.
  • Enabled IO tag mode and refined address mode for better memory management and operational flexibility.