Technical Preview

Pre-release
@luluman luluman released this 01 Apr 09:16
· 800 commits to ind_release since this release

Features

  • Added multi-core support for LLM decoding to improve processing efficiency.
  • Introduced fx2mlir, a new entry point for MLIR conversion.
  • Implemented nnvlc2.0 compression for local activations and nnvlc1.0 for weights, improving neural network performance.
  • Added TPULANG support for sort, argsort, and other operations, extending the language's functionality and flexibility.
  • Added cv186x support to run_sensitive_layer.py and the TDB, expanding compatibility and debugging capabilities.
  • Added watchpoints to the TDB and scale & zero_point support for activation ops, broadening the functionality available in the tpu-mlir project.
  • Added support for BM1690.
  • Enabled L2 memory (L2mem) as an intermediate data-exchange buffer for active tensors.
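
One of the items above adds scale & zero_point support to activation ops. As a rough illustration of what those two parameters mean in standard asymmetric quantization (a generic sketch, not tpu-mlir's actual API):

```python
import numpy as np

def quantize(x, scale, zero_point, dtype=np.int8):
    # q = round(x / scale) + zero_point, clamped to the integer range
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    # recover an approximation of the original float value
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 127, 0
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

The round-trip error is bounded by the scale, which is why per-op scale/zero_point support matters for quantized accuracy.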

Bug Fixes

  • Resolved backend bugs, including issues in the 1684x backend and the permutefuse2 and permutemulconstswap patterns, improving overall stability and performance.
  • Fixed several critical tpulang issues, including errors in the sort_by_key, reshape, and where operations, making the language more reliable for developers.
  • Fixed model-processing bugs, including concat logic, scale2conv, scale2conv3d, and instance norm, ensuring smoother model optimization and execution.
  • Corrected errors in the documentation, providing clearer and more accurate information for users and developers.
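
The scale2conv fixes concern a common graph optimization: a per-channel scale following a convolution can be folded into the convolution's weights and bias, since s·(W∗x + b) + t = (s·W)∗x + (s·b + t). A minimal numpy sketch of the equivalence (names are illustrative, not tpu-mlir's actual pass API):

```python
import numpy as np

def conv2d(x, w, b):
    # naive valid convolution: x (C_in, H, W), w (C_out, C_in, kH, kW), b (C_out,)
    C_out, C_in, kH, kW = w.shape
    H, W = x.shape[1] - kH + 1, x.shape[2] - kW + 1
    out = np.zeros((C_out, H, W))
    for co in range(C_out):
        for i in range(H):
            for j in range(W):
                out[co, i, j] = np.sum(x[:, i:i + kH, j:j + kW] * w[co]) + b[co]
    return out

def fold_scale_into_conv(w, b, s, t):
    # fold a per-output-channel scale (y -> s*y + t) into conv weights and bias
    return w * s[:, None, None, None], b * s + t

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 6))
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
s = rng.standard_normal(4)
t = rng.standard_normal(4)
w2, b2 = fold_scale_into_conv(w, b, s, t)
fused = conv2d(x, w2, b2)
expected = s[:, None, None] * conv2d(x, w, b) + t[:, None, None]
```

Folding removes one elementwise pass over the activation tensor, which is why getting the fold's corner cases right matters for both correctness and performance.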

Documentation Updates

  • Updated tpulang documentation to include new functionalities and optimizations, making it easier for users to understand and utilize the language effectively.

Performance Improvements

  • Optimized TDB and bmodel_checker for 1684x pcie mode, significantly reducing processing times and enhancing efficiency for model analysis.
  • Improved DMA efficiency in flash attention operations for faster data handling and processing.
  • Enabled IO tag mode and refined address mode for better memory management and operational flexibility.